skip to main content


Search for: All records

Creators/Authors contains: "Parhi, Keshab K."

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. This paper addresses the design of a partly-parallel cascaded FFT-IFFT architecture that does not require any intermediate buffer. Folding can be used to design partly-parallel architectures for FFT and IFFT. While many cascaded FFT-IFFT architectures can be designed using various folding sets for the FFT and the IFFT, for a specified folded FFT architecture, there exists a unique folding set to design the IFFT architecture that does not require an intermediate buffer. Such a folding set is designed by processing the output of the FFT as soon as possible (ASAP) in the folded IFFT. Elimination of the intermediate buffer reduces latency and saves area. The proposed approach is also extended to interleaved processing of multi-channel time-series. The proposed FFT-IFFT cascade architecture saves about N/2 memory elements and N/4 clock cycles of latency compared to a design with identical folding sets. For the 2-interleaved FFT-IFFT cascade, the memory and latency savings are, respectively, N/2 units and N/2 clock cycles, compared to a design with identical folding sets. 
    more » « less
    Free, publicly-accessible full text available April 14, 2025
  2. Quantum computing is an emerging technology that has the potential to achieve exponential speedups over their classical counterparts. To achieve quantum advantage, quantum principles are being applied to fields such as communications, information processing, and artificial intelligence. However, quantum computers face a fundamental issue since quantum bits are extremely noisy and prone to decoherence. Keeping qubits error free is one of the most important steps towards reliable quantum computing. Different stabilizer codes for quantum error correction have been proposed in past decades and several methods have been proposed to import classical error correcting codes to the quantum domain. Design of encoding and decoding circuits for the stabilizer codes have also been proposed. Optimization of these circuits in terms of the number of gates is critical for reliability of these circuits. In this paper, we propose a procedure for optimization of encoder circuits for stabilizer codes. Using the proposed method, we optimize the encoder circuit in terms of the number of 2-qubit gates used. The proposed optimized eight-qubit encoder uses 18 CNOT gates and 4 Hadamard gates, as compared to 14 single qubit gates, 33 2-qubit gates, and 6 CCNOT gates in a prior work. The encoder and decoder circuits are verified using IBM Qiskit. We also present encoder circuits for the Steane code and a 13-qubit code, that are optimized with respect to the number of gates used, leading to a reduction in number of CNOT gates by 1 and 8, respectively. 
    more » « less
    Free, publicly-accessible full text available April 17, 2025
  3. Quantum computers have the potential to provide exponential speedups over their classical counterparts. Quantum principles are being applied to fields such as communications, information processing, and artificial intelligence to achieve quantum advantage. However, quantum bits are extremely noisy and prone to decoherence. Thus, keeping the qubits error free is extremely important toward reliable quantum computing. Quantum error correcting codes have been studied for several decades and methods have been proposed to import classical error correcting codes to the quantum domain. Along with the exploration into novel and more efficient quantum error correction codes, it is also essential to design circuits for practical realization of these codes. This paper serves as a tutorial on designing and simulating quantum encoder and decoder circuits for stabilizer codes. We first describe Shor’s 9-qubit code which was the first quantum error correcting code. We discuss the stabilizer formalism along with the design of encoding and decoding circuits for stabilizer codes such as the five-qubit code and Steane code. We also design nearest neighbor compliant circuits for the above codes. The circuits were simulated and verified using IBM Qiskit. 
    more » « less
    Free, publicly-accessible full text available March 5, 2025
  4. This tutorial aims to establish connections between polynomial modular multiplication over a ring to circular convolution and the discrete Fourier transform (DFT). The main goal is to extend the well-known theory of the DFT in signal processing (SP) to other applications involving polynomials in a ring, such as homomorphic encryption (HE). 
    more » « less
    Free, publicly-accessible full text available January 1, 2025
  5. Data privacy has become a significant concern due to the rapid development of cloud services, Internet of Things, edge devices, and other applications. Homomorphic encryption (HE) addresses the issue by enabling computations to be performed without the decryption of the encrypted message. However, the bottleneck of designing homomorphic encryption hardware is the complexity of computation. To tackle the long integer arithmetic, the residue number system based on the Chinese remainder theorem is used. In this paper, we propose a novel modular reduction architecture that computes the mapping of residual polynomials in parallel with high speed and low latency. We implement our proposed design in the Xilinx Ultrascale+ FPGA board (VCU118). When the input sizes are 360-bit (1440-bit), the frequency is 180MHz (168MHz) with 4 pipelining stages. Also, the area delay product (ADP) of DSP blocks of our design is reduced by 23 and 31 percent, respectively, for 360 and 1440 bits, compared to prior work. 
    more » « less
    Free, publicly-accessible full text available December 19, 2024
  6. Graph Neural Networks (GNNs) are a form of deep learning that have found use for a variety of problems, including the modeling of drug interactions, time-series analysis, and traffic prediction. They represent the problem using non-Euclidian graphs, allowing for a high degree of versatility, and are able to learn complex relationships by iteratively aggregating more contextual information from neighbors that are farther away. Inspired by its power in transformers, Graph Attention Networks (GATs) incorporate an attention mechanism on top of graph aggregation. GATs are considered the state of the art due to their superior performance. To learn the best parameters for a given graph problem, GATs use traditional backpropagation to compute weight updates. To the best of our knowledge, these updates are calculated in software, and closed-form equations describing their calculation for GATs aren’t well known. This paper derives closed-form equations for backpropagation in GATs using matrix notation. These equations can form the basis for design of hardware accelerators for training GATs. 
    more » « less
    Free, publicly-accessible full text available October 16, 2024
  7. CRYSTAL-Kyber (Kyber) is one of the post-quantum cryptography (PQC) key-encapsulation mechanism (KEM) schemes selected during the standardization process. This paper addresses optimization for Kyber architecture with respect to latency and throughput constraints. Specifically, matrix-vector multiplication and number theoretic transform (NTT)-based polynomial multiplication are critical operations and bottle-necks that require optimization. To address this challenge, we propose an algorithm and hardware co-design approach to systematically optimize matrix-vector multiplication and NTT-based polynomial multiplication by employing a novel sub-structure sharing technique in order to reduce computational complexity, i.e., the number of modular multiplications and modular additions/subtractions consumed. The sub-structure sharing approach is inspired by prior fast parallel approaches based on polyphase decomposition. The proposed efficient feed-forward architecture achieves high speed, low latency, and full utilization of all hardware components, which can significantly enhance the overall efficiency of the Kyber scheme. The FPGA implementation results show that our proposed design, using the fast two-parallel structure, leads to an approximate reduction of 90% in execution time (μs) , along with a 66× improvement in throughput performance. 
    more » « less
    Free, publicly-accessible full text available October 28, 2024
  8. Hyperdimensional computing (HDC) has drawn significant attention due to its comparable performance with traditional machine learning techniques. HDC classifiers achieve high parallelism, consume less power, and are well-suited for edge applications. Encoding approaches such as record-based encoding and N -gram-based encoding have been used to generate features from input signals and images. These features are mapped to hypervectors and are input to HDC classifiers. This paper considers the group-based classification of graphs constructed from time series. The graph is encoded to a hypervector and the graph hypervectors are used to train the HDC classifier. This paper applies HDC to brain graph classification using fMRI data. Both the record-based encoding and GrapHD encoding are explored. Experimental results show that 1) For HDC encoding approaches, GrapHD encoding can achieve comparable classification performance and require significantly less memory storage compared to record-based encoding. 2) The utilization of sparsity can achieve higher performance as compared to fully connected brain graphs. Both threshold strategy and the minimum redundancy maximum relevance (mRMR) algorithm are employed to generate sub-graphs, where mRMR achieves higher performance for three binary classification problems: emotion vs. gambling, emotion vs. no-task, and gambling vs. no-task. The corresponding AUCs are 0.87, 0.88, and 0.88, respectively. 
    more » « less
    Free, publicly-accessible full text available October 29, 2024
  9. Hyperdimensional computing (HDC) has been assumed to be attractive for time-series classification. These classifiers are ideal for one or few-shot learning and require fewer resources. These classifiers have been demonstrated to be useful in seizure detection. This paper investigates subject-specific seizure prediction using HDC from intracranial elec-troencephalogram (iEEG) from the publicly available Kaggle dataset. In comparison to seizure detection (interictal vs. ictal), seizure prediction (interictal vs. preictal) is a more challenging problem. Two HDC-based encoding strategies are explored: local binary pattern (LBP) and power spectral density (PSD). The average performance of HDC classifiers using the two encoding approaches is computed using the leave-one-seizure-out cross-validation method. Experimental results show that the PSD method using a small number of features selected by the minimum redundancy maximum relevance (mRMR) achieves better seizure prediction performance than the LBP method on the trairring and validation data. 
    more » « less
    Free, publicly-accessible full text available August 6, 2024
  10. Modern neural networks have revolutionized the fields of computer vision (CV) and Natural Language Processing (NLP). They are widely used for solving complex CV tasks and NLP tasks such as image classification, image generation, and machine translation. Most state-of-the-art neural networks are over-parameterized and require a high computational cost. One straightforward solution is to replace the layers of the networks with their low-rank tensor approximations using different tensor decomposition methods. This article reviews six tensor decomposition methods and illustrates their ability to compress model parameters of convolutional neural networks (CNNs), recurrent neural networks (RNNs) and Transformers. The accuracy of some compressed models can be higher than the original versions. Evaluations indicate that tensor decompositions can achieve significant reductions in model size, run-time and energy consumption, and are well suited for implementing neural networks on edge devices. 
    more » « less